Twelve complete genomes of three novel coronaviruses, namely bat coronavirus HKU4 (bat-CoV HKU4) and bat-CoV
HKU5 (putative group 2c) and bat-CoV HKU9 (putative group 2d), were sequenced. Comparative genome
analysis showed that the various open reading frames (ORFs) of the genomes of the three coronaviruses had
significantly higher amino acid identities to those of other group 2 coronaviruses than to those of group 1 and 3
coronaviruses. Phylogenetic trees constructed using chymotrypsin-like protease, RNA-dependent RNA polymerase,
helicase, spike, and nucleocapsid all showed that the group 2a and 2b and putative group 2c and 2d
coronaviruses are more closely related to each other than to group 1 and 3 coronaviruses. Unique genomic
features distinguishing between these four subgroups, including the number of papain-like proteases, the
presence or absence of hemagglutinin esterase, small ORFs between the membrane and nucleocapsid genes
and ORFs (NS7a and NS7b), bulged stem-loop and pseudoknot structures downstream of the nucleocapsid
gene, transcription regulatory sequence, and ribosomal recognition signal for the envelope gene, were also
observed. This is the first time that NS7a and NS7b downstream of the nucleocapsid gene have been found in
a group 2 coronavirus. The high Ka/Ks ratio of NS7a and NS7b in bat-CoV HKU9 implies that these two group
2d-specific genes are under high selective pressure and hence are rapidly evolving. The four subgroups of group
2 coronaviruses probably originated from a common ancestor. Further molecular epidemiological studies on
coronaviruses in the bats of other countries, as well as in other animals, and complete genome sequencing will
shed more light on coronavirus diversity and their evolutionary histories.


Coronaviruses are found in a wide variety of animals and can 
cause respiratory, enteric, hepatic, and neurological diseases of 
varying severity. Based on genotypic and serological character- 
ization, coronaviruses were divided into three distinct groups 
(3, 12, 36). As a result of the unique mechanism of viral 
replication, coronaviruses have a high frequency of recombi- 
nation (12). Their tendency for recombination and high muta- 
tion rates may allow them to adapt to new hosts and ecological 
niches (8, 33). 

The recent severe acute respiratory syndrome (SARS) epi- 
demic, the discovery of SARS coronavirus (SARS-CoV), and 
identification of SARS-CoV-like viruses from Himalayan palm 
civets and a raccoon dog from wild live markets in China have 
boosted interest in the discovery of novel coronaviruses in both 
humans and animals (6, 17, 19, 21, 31). In 2004, a novel group 
1 human coronavirus, human coronavirus NL63 (HCoV- 
NL63), was reported independently by two groups (5, 27). In 
2005, we described the discovery, complete genome sequence, 
clinical features, and molecular epidemiology of another novel 
group 2 human coronavirus, coronavirus HKU1 (CoV-HKU1) 
(14, 29, 32). Recently, we have also described the discovery of 
SARS-CoV-like virus in Chinese horseshoe bats and a novel 
group 1 coronavirus in large bent-winged bats, lesser bent- 
winged bats, and Japanese long-winged bats in Hong Kong (13, 
20). SARS-CoV-like viruses have also been identified in horse- 
shoe bats in other provinces of China (15). Based on these 
findings, a territory-wide molecular surveillance study was con- 
ducted to examine the diversity of coronaviruses in bats of our 
locality, and in this search six novel coronavirus species were 
discovered (30). From phylogenetic analysis of the RNA-de- 
pendent RNA polymerase (pol) and helicase genes, two of the 
viruses, bat coronavirus HKU4 (bat-CoV HKU4) and bat coro- 
navirus HKU5 (bat-CoV HKU5), seemed to form a distinct 
subgroup in group 2 coronavirus. 

In the present study, we extended our survey to include 
specimens of bats in the Guangdong province of Southern 
China where the SARS epidemic originated and wet-markets 
and game food restaurants serving bat dishes are commonly 
found (34). Five different coronaviruses were identified, in- 
cluding two previously undescribed coronavirus species: bat 
coronavirus HKU9 (bat-CoV HKU9) and bat coronavirus 
HKU10 (bat-CoV HKU10). In addition, we sequenced four 
complete genomes each of the two putative group 2c corona- 
viruses (bat-CoV HKU4 and bat-CoV HKU5) we discovered 
in Hong Kong (30) and the putative group 2d coronavirus 

TABLE 1. Bat species captured and associated coronaviruses in the present surveillance study 

Scientific name          Common name                  No. of bats   No. (%) of bats positive   Coronavirus(es) (n)a
                                                      tested        for coronaviruses
Hipposideros larvatus    Intermediate roundleaf bat   2             0 (0)
Hipposideros armiger     Great roundleaf bat          26            0 (0)
Hipposideros pomona      Pomona roundleaf bat         1             0 (0)
Miniopterus magnater     Greater bent-winged bat      14            0 (0)
Miniopterus pusillus     Lesser bent-winged bat       13            2 (15)                     Bat-CoV HKU8
Myotis ricketti          Rickett’s big-footed bat     1             0 (0)
Rhinolophus osgoodi      Osgood’s horseshoe bat       1             0 (0)
Rhinolophus pusillus     Least horseshoe bat          12            0 (0)
Rhinolophus affinus      Intermediate horseshoe bat   25            0 (0)
Rhinolophus sinicus      Chinese horseshoe bat        64            7 (11)                     Bat-CoV HKU2 (6), Bat-SARS-CoV HKU3 (1)
Rousettus lechenaulti    Leschenault’s rousette       350           43 (12)                    Bat-CoV HKU9 (42), Bat-CoV HKU10 (1)

a n, number of bats positive for indicated virus. 


(bat-CoV HKU9) discovered in the present study and com- 
pared the 12 genomes with those of other coronaviruses. Based 
on the results of the present study, we propose two novel 
subgroups, group 2c and group 2d, among group 2 coronavi- 
ruses. 


MATERIALS AND METHODS 


Sample collection. A total of 509 bats (11 different species) were captured 
from various locations in the Guangdong province of Southern China over a 
7-month period (October 2005 to April 2006). Respiratory and alimentary spec- 
imens were collected by procedures described previously (13, 35). 

RNA extraction. Viral RNA was extracted from the respiratory and alimentary 
specimens by using a QIAamp viral RNA minikit (QIAGEN, Hilden, Germany). 
The RNA was eluted in 50 µl of AVE buffer and was used as the template for 
reverse transcription-PCR (RT-PCR). 

RT-PCR of pol gene of coronaviruses using conserved primers and DNA 
sequencing. Coronavirus screening was performed by amplifying a 440-bp frag- 
ment of the pol gene of coronaviruses using the conserved primers (5'-GGTTG 
GGACTATCCTAAGTGTGA-3’ and 5’-CCATCATCAGATAGAATCATCA 
TA-3') designed by multiple alignments of the nucleotide sequences of available 
pol genes of known coronaviruses (29). RT was performed by using a SuperScript 
II kit (Invitrogen, San Diego, CA). The PCR mixture (25 µl) contained cDNA, 
PCR buffer (10 mM Tris-HCl [pH 8.3], 50 mM KCl, 3 mM MgCl2, and 0.01% 
gelatin), 200 µM concentrations of each deoxynucleoside triphosphate, and 1.0 
U of Taq polymerase (Applied Biosystems, Foster City, CA). The mixtures were 
amplified in 60 cycles of 94°C for 1 min, 48°C for 1 min, and 72°C for 1 min and 
a final extension at 72°C for 10 min in an automated thermal cycler (Applied 
Biosystems). Standard precautions were taken to avoid PCR contamination, and 
no false-positive was observed in negative controls. 
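
As a quick sanity check on the screening primers quoted above, the following minimal Python sketch computes their lengths, G+C fractions, and reverse complements. The helper names (`reverse_complement`, `gc_fraction`) are illustrative and are not part of the original protocol; only the two primer sequences come from the text.

```python
# Screening primers as quoted in the Methods (written 5'->3').
FORWARD = "GGTTGGGACTATCCTAAGTGTGA"
REVERSE = "CCATCATCAGATAGAATCATCATA"

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), f"{gc_fraction(primer):.2f}",
              reverse_complement(primer))
```

The reverse complement of the second primer is the sequence that would be searched for on the genomic (sense) strand when locating the end of the amplified pol fragment.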

The PCR products were gel purified by using a QIAquick gel extraction kit 
(QIAGEN). Both strands of the PCR products were sequenced twice with an 
ABI Prism 3700 DNA analyzer (Applied Biosystems) using the two PCR prim- 
ers. The sequences of the PCR products were compared to known sequences of 
the pol genes of coronaviruses in the GenBank database. 

Viral culture. Two of the samples positive for bat-CoV HKU9 and the sample 
positive for bat-CoV HKU10 were cultured in LLC-Mk2 (rhesus monkey kid- 
ney), MRC-5 (human lung fibroblast), FRhK-4 (rhesus monkey kidney), Huh-7.5 
(human hepatoma), Vero E6 (African green monkey kidney), and HRT-18 
(colorectal adenocarcinoma) cells. 

Complete genome sequencing. Twelve complete genomes of bat-CoV HKU4 
(30), bat-CoV HKU5 (30), and the novel bat coronavirus discovered in the 
present study (bat-CoV HKU9) were amplified and sequenced using the RNA 
extracted from the alimentary specimens as templates. The RNA was converted 
to cDNA by a combined random-priming and oligo(dT) priming strategy. Since 
the initial results revealed that these coronaviruses were group 2 coronaviruses, 
the cDNA was amplified by degenerate primers designed by multiple alignment 
of the genomes of CoV-HKU1 (GenBank accession no. NC_006577), murine 
hepatitis virus (GenBank accession no. NC_006852), human coronavirus OC43 


(GenBank accession no. NC_005147), bovine coronavirus (GenBank accession 
no. NC_003045), rat sialodacryoadenitis coronavirus (GenBank accession no. 
AF207551), equine coronavirus NC99 (GenBank accession no. AY316300), por- 
cine hemagglutinating encephalomyelitis virus (GenBank accession no. 
NC_007732), SARS-CoV (GenBank accession no. NC_004718), and bat-SARS- 
CoV HKU3 (GenBank accession no. DQ022305) and additional primers de- 
signed from the results of the first and subsequent rounds of sequencing. These 
primer sequences are available on request. The 5’ ends of the viral genomes were 
confirmed by rapid amplification of cDNA ends using a 5'/3’ RACE kit (Roche, 
Germany). Sequences were assembled and manually edited to produce final 
sequences of the viral genomes. 

Genome analysis. The nucleotide sequences of the genomes and the deduced 
amino acid sequences of the open reading frames (ORFs) were compared to 
those of other coronaviruses. Phylogenetic tree construction was performed by 
using the neighbor-joining method with CLUSTAL X 1.83. Protein family anal- 
ysis was performed by using PFAM and InterProScan (1, 2). Prediction of 
transmembrane domains was performed by using TMpred and TMHMM (9, 23). 
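
The pairwise amino acid identities reported in this study are computed from aligned sequences. A minimal sketch of such a calculation is given below; it assumes the two sequences have already been aligned to equal length and that columns containing a gap are skipped, which is one common convention and not necessarily the one used in the original analysis. The function name and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns in which either sequence has a gap are ignored
    (an assumption; other conventions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real coronavirus sequences.
    print(percent_identity("MKT-LV", "MKSALV"))
```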

Estimation of synonymous and nonsynonymous substitution rates. The num- 
ber of synonymous substitutions per synonymous site (Ks) and the number of 
nonsynonymous substitutions per nonsynonymous site (Ka) for each coding 
region between each pair of strains were calculated by using the Nei-Gojobori 
method (Jukes-Cantor) in MEGA 3.1 (11). Since the sequences of three of the 
four genomes of bat-CoV HKU4 are almost identical and the sequences of three 
of the four genomes of bat-CoV HKU5 are almost identical, the Ka/Ks ratios for 
the coding regions in bat-CoV HKU4 and bat-CoV HKU5 were each calculated 
using one of these three genomes and the remaining genome that possessed 
more differences. For the four strains of bat-CoV HKU9, six pairwise compar- 
isons were performed for each coding region. 
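
The two arithmetic steps described here can be sketched as follows. The code assumes that the proportions of nonsynonymous and synonymous differences (called `p_n` and `p_s` below, hypothetical names) have already been obtained by Nei-Gojobori site counting, which is not reproduced; it then applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and forms the Ka/Ks ratio. It also enumerates the pairwise comparisons among four strains. The strain labels are placeholders, and this is an illustration rather than the MEGA implementation.

```python
import math
from itertools import combinations


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks_ratio(p_n: float, p_s: float):
    """Return Ka/Ks from uncorrected proportions, or None if Ks is zero."""
    ka = jukes_cantor(p_n)
    ks = jukes_cantor(p_s)
    if ks == 0.0:
        return None  # ratio undefined when no synonymous change is seen
    return ka / ks


# Four strains give six unordered pairs, as stated for bat-CoV HKU9.
strains = ["strain 1", "strain 2", "strain 3", "strain 4"]  # placeholder labels
pairs = list(combinations(strains, 2))

if __name__ == "__main__":
    print(len(pairs))
    print(ka_ks_ratio(0.01, 0.10))
```

When very few substitutions separate two genomes, Ks can be zero or close to zero, and the ratio becomes undefined or unstable; this is why ratios from nearly identical genomes are difficult to interpret.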

Nucleotide sequence accession numbers. The nucleotide sequences of the 12 
genomes of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 have been 
submitted to the GenBank sequence database under accession numbers 
EF065505 to EF065516. 


RESULTS 


Bat surveillance and identification of two novel coronavi- 
ruses. A total of 1,018 respiratory and alimentary specimens 
from 509 bats of 11 different species were obtained in the 
Guangdong province in Southern China (Table 1). RT-PCR 
analyses for a 440-bp fragment in the pol genes of coronavi- 
ruses were positive in alimentary specimens from 52 (10.2%) 
and in a respiratory specimen from 1 (0.2%) of 509 bats. 
Sequencing results suggested the presence of five different 
coronaviruses (Table 1 and Fig. 1). The sequences of two 
samples from lesser bent-winged bat (Miniopterus pusillus) pos- 
sessed >97% nucleotide identities to a group 1 coronavirus 
(bat-CoV HKU8) that we described recently from lesser bent- 

FIG. 1. Phylogenetic analysis of amino acid sequences of the 393-bp fragment of RNA-dependent RNA polymerase of coronaviruses identified 
from bats in the present study. The tree was constructed by the neighbor-joining method using the Jukes-Cantor correction and bootstrap values 
calculated from 1,000 trees. The scale bar indicates the estimated number of substitutions per 50 amino acids. Coronaviruses identified in the 
present study are shown in boldface. Coronaviruses from bats are shaded in gray. HCoV-229E (NC_002645); PEDV, porcine epidemic diarrhea 
virus (NC_003436); TGEV (NC_002306); FIPV (AY994055); HCoV-NL63 (NC_005831); bat-CoV HKU2 (DQ249235), HKU4 
(DQ074652), HKU5 (DQ249219), HKU6 (DQ249224), HKU7 (DQ249226), and HKU8 (DQ249228); CoV-HKU1 (NC_006577); HCoV-OC43 
(NC_005147); MHV, murine hepatitis virus (NC_006852); BCoV, bovine coronavirus (NC_003045); PHEV, porcine hemagglutinating encepha- 
lomyelitis virus (NC_007732); SDAV; SARS-CoV (human), human SARS coronavirus (NC_004718); SARS-CoV (Civet), civet SARS-like 
coronavirus (AY304488); bat-SARS-CoV HKU3, bat-SARS-like coronavirus HKU3 (DQ022305); IBV, infectious bronchitis virus (NC_001451); 
TCoV, turkey coronavirus (AF124991); IBV-like, IBV isolated from peafowl (AY641576). Other abbreviations are as defined in the text. 


winged bats in Hong Kong (30), those of six alimentary spec- 
imens and one respiratory specimen (obtained from one of the 
six bats with positive alimentary specimens) from Chinese 
horseshoe bat (Rhinolophus sinicus) possessed >97% nucleo- 
tide identities to another group 1 coronavirus (bat-CoV 
HKU2) that we described recently from Chinese horseshoe 
bats in Hong Kong (30), and that of one sample from a Chi- 
nese horseshoe bat (Rhinolophus sinicus) possessed >98% nu- 
cleotide identities to bat-SARS-CoV HKU3 that we described 
recently from Chinese horseshoe bats in Hong Kong (13). The 
sequences of 42 samples from Leschenault’s rousette bats 
(Rousettus lechenaulti) had <70% nucleotide identities to all 


known coronaviruses, suggesting a novel group 2 coronavirus 
(bat-CoV HKU9); that of one sample from a Leschenault’s 
rousette bat (Rousettus lechenaulti) had <80% nucleotide 
identities to all known coronaviruses, suggesting a novel group 
1 coronavirus (bat-CoV HKU10). 
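
The counts in Table 1 and the positivity rates quoted in this section can be cross-checked with a few lines of Python. The tuples below are transcribed from Table 1; the variable and function names are illustrative only.

```python
# (species, bats tested, bats positive) transcribed from Table 1.
TABLE_1 = [
    ("Hipposideros larvatus", 2, 0),
    ("Hipposideros armiger", 26, 0),
    ("Hipposideros pomona", 1, 0),
    ("Miniopterus magnater", 14, 0),
    ("Miniopterus pusillus", 13, 2),
    ("Myotis ricketti", 1, 0),
    ("Rhinolophus osgoodi", 1, 0),
    ("Rhinolophus pusillus", 12, 0),
    ("Rhinolophus affinus", 25, 0),
    ("Rhinolophus sinicus", 64, 7),
    ("Rousettus lechenaulti", 350, 43),
]


def percent(part: int, whole: int, digits: int = 0) -> float:
    """Percentage of part in whole, rounded to the given number of digits."""
    return round(100.0 * part / whole, digits)


total_tested = sum(tested for _, tested, _ in TABLE_1)
total_positive = sum(positive for _, _, positive in TABLE_1)

if __name__ == "__main__":
    print(len(TABLE_1), total_tested, total_positive)
    print(percent(total_positive, total_tested, 1))
    for species, tested, positive in TABLE_1:
        if positive:
            print(species, percent(positive, tested))
```

The per-species percentages computed this way match the rounded values listed in Table 1, and the overall rate matches the 10.2% reported for alimentary specimens.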

Viral culture. No cytopathic effect was observed in any of the 
cell lines inoculated with bat specimens positive for bat-CoV 
HKU9 and bat-CoV HKU10. Quantitative RT-PCR using the 
culture supernatants and cell lysates for monitoring the pres- 
ence of viral replication also showed negative results. 

Genome organization and coding potential of bat-CoV 
HKU4, bat-CoV HKU5, and bat-CoV HKU9. Since analysis of 




GROUP 2c AND 2d CORONAVIRUS GENOMES 


VOL. 81, 2007 


Downloaded from http://jvi.asm.org/ on August 9, 2015 by UNIV OF SUSSEX 


TABLE 2. Comparison of genomic features of bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU9, and other coronaviruses and amino acid identities between the predicted 
chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase (Pol), helicase (Hel), spike (S), envelope (E), membrane (M), and nucleocapsid (N) 
proteins of bat-CoV HKU4 and bat-CoV HKU5 and the corresponding proteins of other coronaviruses a 

                    Genome features | Pairwise amino acid identity (%)
                                    | Bat-CoV HKU4                                     | Bat-CoV HKU5                                     | Bat-CoV HKU9
Coronavirus         Size     G+C    | 3CLpro Pol    Hel    S      E      M      N      | 3CLpro Pol    Hel    S      E      M      N      | 3CLpro Pol    Hel    S      E      M      N
                    (bases)  content|
Group 1
HCoV-229E           27,317   0.38   | 48.2   58.8   62.1   24.4   26.4   32.8   24.5   | 49.2   58.0   62.6   25.2   28.2   32.9   27.0   | 42.7   56.7   60.0   26.2   16.5   31.2   19.7
PEDV                28,033   0.42   | 46.7   60.1   61.9   24.6   20.0   37.8   24.2   | 48.2   59.5   62.8   23.8   20.2   35.5   23.5   | 44.5   59.2   59.5   23.2   15.2   33.0   24.0
TGEV                28,586   0.38   | 48.4   59.7   61.3   25.7   22.5   32.7   28.9   | 47.4   59.6   61.9   24.6   20.2   31.7   25.6   | 44.2   57.9   61.3   23.0   14.9   27.2   25.8
FIPV                29,355   0.38   | 49.0   59.7   61.1   27.3   23.9   31.1   29.5   | 48.0   59.8   61.8   26.3   21.6   31.3   27.4   | 42.2   58.2   61.1   22.2   15.1   27.0   25.6
HCoV-NL63           27,553   0.34   | 49.4   58.3   61.8   25.1   18.8   32.2   27.7   | 48.7   58.1   62.6   25.7   18.6   33.8   25.4   | 44.0   57.7   60.5   25.8   13.6   29.1   22.1
Group 2a
CoV-HKU1            29,926   0.32   | 51.6   67.7   66.0   31.7   26.5   44.6   31.7   | 52.0   68.2   65.8   30.0   24.1   44.9   31.1   | 47.7   66.5   65.7   29.9   24.7   39.5   27.7
HCoV-OC43           30,738   0.37   | 51.6   68.7   67.8   32.1   27.3   42.4   32.0   | 51.6   68.8   67.7   30.8   30.7   42.0   33.3   | 48.4   66.6   67.5   29.1   27.4   40.7   31.2
MHV                 31,357   0.42   | 53.3   67.9   66.2   30.9   22.0   43.2   33.3   | 53.9   68.1   65.8   30.2   27.2   42.2   34.2   | 51.0   65.2   67.1   28.5   22.5   41.6   29.6
BCoV                31,028   0.37   | 52.0   68.6   67.5   32.2   26.7   44.6   32.2   | 51.6   68.7   67.3   31.2   25.6   43.5   34.2   | 48.4   66.5   67.5   28.6   27.4   42.7   31.2
PHEV                30,480   0.37   | 52.3   68.7   67.7   32.2   27.8   44.6   31.4   | 52.0   68.8   67.5   30.5   26.7   42.6   32.0   | 48.0   66.7   67.7   29.2   28.6   41.5   29.7
Group 2b
SARS-CoV            29,751   0.41   | 50.0   71.7   70.7   32.4   39.0   43.2   44.0   | 51.1   71.8   71.7   31.9   34.9   43.1   43.2   | 52.0   72.1   73.4   31.8   29.3   43.3   39.2
Bat-SARS-CoV HKU3   29,728   0.41   | 50.3   71.8   70.5   32.7   39.0   43.2   44.4   | 51.4   71.7   71.5   31.7   34.9   42.5   43.6   | 52.0   71.9   73.6   32.2   29.3   43.3   39.9
Group 2c
Bat-CoV HKU4        30,286   0.38   |                                                  | 83.7   92.2   93.5   66.9   79.3   82.7   74.4   | 51.1   69.4   72.9   29.7   19.8   42.3   37.2
Bat-CoV HKU5        30,488   0.43   | 83.7   92.2   93.5   66.9   79.3   82.7   74.4   |                                                  | 50.6   69.0   73.0   30.5   23.3   43.7   35.1
Group 2d
Bat-CoV HKU9        29,114   0.41   | 51.1   69.4   72.9   29.8   19.6   42.6   37.2   | 50.6   69.0   73.0   30.7   23.1   44.0   35.1   |
Group 3
IBV                 27,608   0.38   | 40.6   61.0   57.8   25.6   19.3   28.9   26.4   | 38.5   59.6   58.5   23.0   16.8   27.5   28.6   | 36.6   61.9   61.0   26.8   20.0   30.8   25.7

a Abbreviations are as defined in the text and figure legends. 

FIG. 2. Genome organizations of bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU9, and representative coronaviruses from each group. 
Papain-like proteases (PL1, PL2, and PL) and the nonstructural proteins are represented by white boxes. Hemagglutinin esterase (HE), spike (S), 
envelope (E), membrane (M), and nucleocapsid (N) are represented by gray boxes. 


the 440-bp fragment of the pol gene of bat-CoV HKU9 suggests
a distinct subgroup in group 2 coronavirus and our previous
findings suggest that bat-CoV HKU4 and bat-CoV
HKU5 represent another distinct subgroup of group 2 coronavirus,
complete genome sequence data of four strains each of
bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 were
obtained by assembly of the sequences of the RT-PCR products
from the corresponding individual specimens.

The sizes of the genomes of bat-CoV HKU4, bat-CoV
HKU5, and bat-CoV HKU9 are 30,286 to 30,316 bases, 30,482
to 30,488 bases, and 29,017 to 29,155 bases, respectively, and
their G+C contents are 38, 43, and 41%, respectively (Table 2). Their
genome organizations are similar to those of other coronaviruses,
with the characteristic gene order: 5'-replicase ORF 1ab,
spike (S), envelope (E), membrane (M), and nucleocapsid
(N)-3' (Fig. 2 and Table 3). Both 5' and 3' ends contain short
untranslated regions. The replicase ORF 1ab occupies 20.8 to
21.5 kb of the genomes (Table 3). This ORF encodes a number
of putative proteins, including nsp3 (which contains the putative
papain-like protease [PLpro]), nsp5 (putative chymotrypsin-like
protease [3CLpro]), nsp12 (putative RNA-dependent
RNA polymerase [Pol]), nsp13 (putative helicase), and other


proteins of unknown functions (Table 4). These proteins are
produced by proteolytic cleavage of the large replicase
polyprotein by PLpro and 3CLpro at specific sites (Table 4).
Bat-CoV HKU4 and bat-CoV HKU5 have the same genome
structure (Fig. 2). They also possess the same putative transcription
regulatory sequence (TRS) motif, 5'-ACGAAC-3', at
the 3' end of the leader sequence and preceding each ORF
except NS3c and N (Table 3). This TRS has also been shown
to be the TRS for SARS-CoV (10). No TRS was observed
upstream of NS3c, whereas the TRS for N is ACGAAU in all
eight strains of bat-CoV HKU4 and bat-CoV HKU5. Similar
to other group 2b coronaviruses, the genomes of bat-CoV
HKU4 and bat-CoV HKU5 have putative PLpro, which is
homologous to PL2pro of group 1 and group 2a and PLpro of
group 3 coronaviruses (Fig. 3). In the genomes of bat-CoV
HKU4 and bat-CoV HKU5, between S and E, four ORFs that
encode putative nonstructural proteins (NS3a, NS3b, NS3c,
and NS3d) were observed. A BLAST search revealed no amino
acid similarities between these four putative nonstructural proteins
and other known proteins, and no functional domains
were identified by PFAM and InterProScan. TMHMM and
TMpred analyses showed three putative transmembrane do- 

TABLE 3. Coding potential and putative transcription regulatory sequences of the genomes of bat-CoV HKU4, 
bat-CoV HKU5, and bat-CoV HKU9 

                                                                          Putative TRS
Coronavirus    ORF    Start-end      No. of        No. of        Frame    Nucleotide    TRS sequence
                      (nucleotide    nucleotides   amino acids            position in
                      position)                                           genome
Bat-CoV HKU4   1a     267-13550      13,284        4,428         +3       63            ACGAAC(198)AUG
               1b     13550-21625    8,076         2,692         +2
               S      21570-25628    4,059         1,352         +3       21519         ACGAAC(45)AUG
               NS3a   25655-25930    276           91            +2       25636         ACGAAC(13)AUG
               NS3b   25948-26307    360           119           +1       25940         ACGAACUUAUG
               NS3c   26111-26968    858           285           +2
               NS3d   26984-27667    684           227           +2       26976         ACGAACUUAUG
               E      27737-27985    249           82            +2       27730         ACGAACUAUG
               M      28000-28659    660           219           +1       27985         ACGAAC(9)AUG
               N      28697-29968    1,272         423           +2       28674         ACGAAU(16)AUG
Bat-CoV HKU5   1a     260-13681      13,422        4,474         +2       61            ACGAAC(193)AUG
               1b     13681-21798    8,118         2,706         +1
               S      21725-25798    4,074         1,357         +2       21674         ACGAAC(45)AUG
               NS3a   25761-26126    366           121           +3       25807         ACGAACUUAUG
               NS3b   26139-26498    360           119           +3       26130         ACGAACUUCAUG
               NS3c   26380-27150    771           256           +1
               NS3d   27160-27831    672           223           +1       27152         ACGAACUUAUG
               E      27909-28157    249           82            +3       27902         ACGAACUAUG
               M      28172-28834    663           220           +3       28157         ACGAAC(9)AUG
               N      28884-30167    1,284         427           +3       28861         ACGAAU(16)AUG
Bat-CoV HKU9   1a     229-12951      12,723        4,241         +1       71            ACGAAC(152)AUG
               1b     12951-21020    8,070         2,690         +3
               S      20974-24798    3,825         1,274         +1       20926         ACGAAC(42)AUG
               NS3    24795-25457    663           220           +3       24786         ACGAACAGUAUG
               E      25457-25696    240           79            +2       25448         UCGAACUAUAAUG
               M      25689-26357    669           222           +3       25662         ACGAAC(21)AUG
               N      26419-27825    1,407         468           +1       26408         ACGAACCUAUUAUG
               NS7a   27869-28426    558           185           +2       27863         ACGAACAUG
               NS7b   28433-28882    450           149           +2       28427         ACGAACAUG
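
The columns of Table 3 are related by simple coordinate arithmetic, which the sketch below makes explicit. It assumes 1-based inclusive coordinates, takes the reading frame to be ((start - 1) mod 3) + 1, and places the AUG at the TRS position plus the six-base TRS plus the number of intervening bases given in parentheses. The frame convention is an assumption; it reproduces nearly all of the legible Frame entries in the table but is not stated in the source, and all function names are hypothetical.

```python
TRS_LENGTH = 6  # length of the ACGAAC motif


def orf_length(start: int, end: int) -> int:
    """Number of nucleotides in an ORF given 1-based inclusive coordinates."""
    return end - start + 1


def reading_frame(start: int) -> int:
    """Reading frame (1, 2, or 3) of an ORF starting at a 1-based position.

    Assumed convention: position 1 of the genome opens frame +1.
    """
    return (start - 1) % 3 + 1


def start_from_trs(trs_position: int, spacer: int) -> int:
    """Position of the AUG given the TRS position and the spacer length."""
    return trs_position + TRS_LENGTH + spacer


if __name__ == "__main__":
    # Bat-CoV HKU4 ORF 1a and S, using coordinates from Table 3.
    print(orf_length(267, 13550), orf_length(267, 13550) // 3)
    print(reading_frame(21570))
    print(start_from_trs(21519, 45))
```

Applied to bat-CoV HKU4, the S gene TRS at position 21519 followed by 45 intervening bases places the AUG at position 21570, which is the tabulated start of S.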


TABLE 4. Characteristics of putative nonstructural proteins of 
replicase in bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 

                                              Amino acids (first residue-last residue, with positions)
nsp     Putative function or domain a         Bat-CoV HKU4    Bat-CoV HKU5    Bat-CoV HKU9
nsp1    Unknown                               M1-G195         M1-G195         M1-G175
nsp2    Unknown                               D196-G847       196-G851        176-G772
nsp3    Putative PLpro domain                 M848-G2784      A852-G2829      G773-G2609
nsp4    Hydrophobic domain                    G2785-Q3291     G2830-Q3337     G2610-Q3103
nsp5    3CLpro                                S3292-Q3597     S3338-Q3643     A3104-Q3409
nsp6    Hydrophobic domain                    S3598-Q3889     S3644-Q3935     G3410-Q3699
nsp7    Unknown                               S3890-Q3972     S3936-Q4018     S3700-Q3782
nsp8    Unknown                               A3973-Q4171     A4019-Q4217     A3783-Q3982
nsp9    Unknown                               N4172-Q4281     N4218-Q4327     N3983-H4094
nsp10   Unknown                               A4282-Q4420     A4328-Q4466     A4095-Q4233
nsp11   Unknown (short peptide at the end     S4421-V4434     S4467-4480      A4234-4248
        of ORF1a)
nsp12   Pol                                   S4421-Q5354     S4467-Q5400     A4234-Q5165
nsp13   Hel                                   A5355-Q5952     A5401-Q5998     S5166-Q5766
nsp14   ExoN                                  S5953-Q6475     S5999-Q6522     S5767-Q6296
nsp15   XendoU                                G6476-Q6817     G6523-Q6871     S6297-Q6633
nsp16   2'-O-MT                               A6818-R7119     A6872-R7179     A6634-6930

a PLpro, papain-like protease; 3CLpro, chymotrypsin-like protease; Pol, RNA-dependent 
RNA polymerase; Hel, helicase; ExoN, 3'-to-5' exonuclease; XendoU, 
poly(U)-specific endoribonuclease; 2'-O-MT, S-adenosylmethionine-dependent 
2'-O-ribose methyltransferase. 


mains in NS3d of bat-CoV HKU4 (residues 37 to 59, 71 to 90, 
and 94 to 111) and bat-CoV HKU5 (residues 32 to 54, 67 to 84, 
and 89 to 108). Similar to group 2a and 2b coronaviruses, 18 to 
81 and 19 to 82 nucleotides downstream of the N genes (nucleotide 
positions 29986 to 30049 in bat-CoV HKU4 and nucleotide 
positions 30186 to 30249 in bat-CoV HKU5), the 3' 
untranslated regions of the two genomes contain predicted 
bulged stem-loop structures (Fig. 4). Downstream of the 
bulged stem-loop structures, 77 to 126 and 78 to 129 nucleotides 
downstream of the N genes (nucleotide positions 30045 
to 30094 in bat-CoV HKU4 and nucleotide positions 30245 to 
30296 in bat-CoV HKU5), pseudoknot structures are present 
(Fig. 4). 
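
The offsets and genome positions quoted above are consistent with the N gene end positions listed in Table 3 (29968 for bat-CoV HKU4 and 30167 for bat-CoV HKU5). A short sketch of the conversion, with illustrative names, is:

```python
# Last nucleotide of the N gene, taken from Table 3.
N_GENE_END = {"bat-CoV HKU4": 29968, "bat-CoV HKU5": 30167}


def downstream_span(n_end: int, first_offset: int, last_offset: int):
    """Convert offsets downstream of N into genome coordinates."""
    return (n_end + first_offset, n_end + last_offset)


if __name__ == "__main__":
    # Bulged stem-loop structures.
    print(downstream_span(N_GENE_END["bat-CoV HKU4"], 18, 81))
    print(downstream_span(N_GENE_END["bat-CoV HKU5"], 19, 82))
    # Pseudoknot structures.
    print(downstream_span(N_GENE_END["bat-CoV HKU4"], 77, 126))
    print(downstream_span(N_GENE_END["bat-CoV HKU5"], 78, 129))
```

In both genomes the computed stem-loop and pseudoknot spans overlap by five nucleotides, since each pseudoknot begins four positions before the corresponding stem-loop ends.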

For the genome of bat-CoV HKU9, similar to bat-CoV 
HKU4, bat-CoV HKU5, and the group 2b coronaviruses, the 
putative TRS motif, 5'-ACGAAC-3', is also observed. This 
putative TRS is present at the 3' end of the leader sequence 
and precedes each ORF except E, of which the putative TRS 
is UCGAAC (Table 3). Interestingly, the P1 position of the 
putative cleavage site by 3CLpro at the junction between nsp9 
and nsp10 is occupied by histidine instead of glutamine. This 
exception was also previously observed at the junction between 
the helicase and nsp14 in CoV-HKU1 and HCoV-NL63, where 
the P1 positions are also occupied by histidine instead of glutamine 
(26, 28). One ORF, which encodes a putative nonstructural 
protein (NS3), is observed between the S and E genes. 




1580 WOO ET AL. J. VIROL. 


FIG. 3. Multiple alignments of PLpro of SARS-CoV, BtCoV/133/05 (NC_008315), bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU9, and IBV 
and PL2pro of HCoV-229E, TGEV, HCoV-OC43, and MHV. Amino acids conserved across all coronaviruses are highlighted in black. Amino acids 
conserved in 60 to 90% of the coronaviruses are highlighted in gray. The conserved Cys and His amino acid residues of the catalytic dyad are 
marked with an asterisk, the conserved postulated metal-chelating Cys and His residues are marked with a “#” symbol, and the conserved aromatic 
amino acid immediately downstream of the catalytic Cys is marked with a “+” symbol. 


FIG. 4. Predicted bulged stem-loop and pseudoknot structures downstream of N in genomes of bat-CoV HKU4 and bat-CoV HKU5. Stop 
codons for the N genes are boxed. Broken lines indicate alternative base pairing. 

Notably, at the 3' end of the genome, it contains the longest 
stretch of nucleotides (1,289 bases) after the N gene among all 
known coronaviruses with complete genomes available, where 
two ORFs that encode putative nonstructural proteins (NS7a 
and NS7b) are observed. A BLAST search revealed no amino 
acid similarities between these three putative nonstructural 
proteins and other known proteins, and no functional domain 
was identified by PFAM and InterProScan. TMHMM and 
TMpred analysis showed three putative transmembrane domains 
in NS3 (residues 30 to 47, 54 to 76, and 80 to 99). No 


bulged stem-loop and pseudoknot structures, similar to those 
in other group 2 coronaviruses, are observed downstream to N, 
NS7a, or NS7b in the bat-CoV HKU9 genomes. 

Phylogenetic analyses. The phylogenetic trees constructed 
using the amino acid sequences of the 3CLpro, Pol, helicase, 
S, and N of bat-CoV HKU4, bat-CoV HKU5, bat-CoV 
HKU9, and other coronaviruses are shown in Fig. 5, and the 
corresponding pairwise amino acid identities are shown in 
Table 2. For all of the five genes, bat-CoV HKU4, bat-CoV 
HKU5, and bat-CoV HKU9 possess higher amino acid identities 
to the homologous genes in other group 2 coronaviruses 
than to those of group 1 and group 3 coronaviruses 
(Table 2). In all five trees, all strains of bat-CoV HKU4, 
bat-CoV HKU5, and another strain of coronavirus recently 
described (24) were clustered together, with bootstrap values 
of 1,000 in all cases, forming a distinct subgroup (Fig. 5). 
Within this subgroup, all four strains of bat-CoV HKU4 
were clustered with the strain of coronavirus recently described 
(BtCoV/133/05) (24), and all four strains of bat-CoV 
HKU5 were clustered separately, forming two distinct sublineages. 
Furthermore, in all five trees, all strains of bat-CoV 
HKU9 were clustered together, with bootstrap values 
of 1,000 in all cases, forming another distinct subgroup (Fig. 
5). From both phylogenetic tree analysis and amino acid 
differences, the strains of bat-CoV HKU9 subgroup were 
more closely related to the group 2b coronaviruses than the 
others (Fig. 5 and Table 2). We propose two novel subgroups, 
group 2c and group 2d, of coronavirus to describe 
these two distinct subgroups, respectively. 

Estimation of synonymous and nonsynonymous substitution 
rates. The Ka/Ks ratios for the various coding regions in bat-CoV 
HKU4, bat-CoV HKU5, and bat-CoV HKU9 are shown in 
Table 5. For bat-CoV HKU4, the numbers of synonymous and 
nonsynonymous mutations were small. Therefore, the Ka/Ks 
ratios of the various coding regions, for example the exceptionally 
high Ka/Ks ratios of nsp6, NS3c, and N, were not 
conclusive. For bat-CoV HKU5, the Ka/Ks ratios of the various 
coding regions were small, implying that the genes were 
stably evolving. Notably, the Ka/Ks ratio for NS3c of bat-CoV 
HKU5 is 0.027, which suggested that this gene is expressed and 
stably evolving. However, NS3c possesses neither TRS nor 
internal ribosomal entry site (IRES). Further experiments are 
necessary to elucidate whether NS3c is expressed and, if it is 
expressed, what signal sequence is involved for ribosomal recognition. 
For bat-CoV HKU9, the mean Ka/Ks ratios of NS7a 
and NS7b (0.961 and 0.529, respectively) were significantly higher than those of 
other coding regions, implying that these two genes are rapidly 
evolving. 


DISCUSSION 


Two putative new subgroups, 2c and 2d, of coronaviruses, 
are described. The four strains of bat-CoV HKU4 and the four 
strains of bat-CoV HKU5 formed two distinct branches in the 
putative subgroup 2c lineage in all five phylogenetic trees analyzed 
(Fig. 5). Moreover, all strains of bat-CoV HKU4 were 
found in lesser bamboo bats, whereas all strains of bat-CoV 
HKU5 were found in Japanese pipistrelle (30). These findings 
support the view that bat-CoV HKU4 and bat-CoV HKU5 are 
two separate coronavirus species. Since bat-CoV HKU4 and 
bat-CoV HKU5 have the same genome organization and share 
the same TRS, we speculate that these two coronaviruses originated 
from the same ancestor, and their subsequent divergence 
into two separate species was due to the adaptation to 
different hosts and ecological niches. As for bat-CoV HKU9, 
the S and N genes showed quite marked nucleotide polymorphism 
and amino acid sequence changes, but the amino acid 
sequences of 3CLP*°, Pol, and helicase are relatively conserved 
J. VIROL.


FIG. 5. Phylogenetic analysis of chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase (Pol), helicase (Hel), spike (S), and
nucleocapsid (N) of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9. The trees were constructed by the neighbor-joining method using the
Jukes-Cantor correction and bootstrap values calculated from 1,000 trees. We included 327, 949, 609, 1,661, and 582 amino acid positions in 3CLpro,
Pol, helicase, S, and N, respectively, in the analysis. The scale bar indicates the estimated number of substitutions per 10 amino acids. Abbreviations
are as defined in the text or in the legend to Fig. 1.
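
The Jukes-Cantor correction named in the legend to Fig. 5 can be sketched as follows. This is a minimal illustration that assumes the standard 20-state form for amino acid sequences, d = -(19/20) ln(1 - (20/19) p), where p is the observed proportion of differing sites. The function names and the toy sequences are hypothetical, and the sketch is not the software used to build the trees.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor_distance(p: float, n_states: int = 20) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p.

    n_states is 20 for amino acids (4 would be used for nucleotides).
    """
    b = (n_states - 1) / n_states
    if p >= b:
        # The correction is undefined once the sequences are saturated.
        raise ValueError("p is too large for the Jukes-Cantor correction")
    return -b * math.log(1.0 - p / b)


# Toy aligned peptides (hypothetical, not taken from the genomes studied).
seq_1 = "MKTAYIAKQR"
seq_2 = "MKTAHIAKQR"
p = p_distance(seq_1, seq_2)
print(f"p = {p:.3f}, corrected d = {jukes_cantor_distance(p):.4f}")
```

The corrected distance always exceeds the observed proportion for p > 0, because it accounts for multiple substitutions at the same site.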


(Fig. 5). Furthermore, all 42 strains of bat-CoV HKU9 were
found in the same bat species, Leschenault’s rousette. These
findings support the view that all 42 strains of bat-CoV
HKU9 belong to one coronavirus species. Complete genome
sequencing of more bat-CoV HKU9 strains may reveal
genotypes and even recombination events, as in the case of
CoV-HKU1 (33). Based on phylogenetic tree analysis, although
coronaviruses of group 2c (bat-CoV HKU4 and bat-CoV
HKU5) and group 2d (bat-CoV HKU9) are more closely
related to the other group 2 coronaviruses than to group 1 and
3 coronaviruses, they formed branches distinct from the group
2a and 2b coronaviruses. Furthermore, bat-CoV HKU4,
bat-CoV HKU5, and bat-CoV HKU9 of these two newly proposed
subgroups possess additional genomic features different from
those of other group 2 coronaviruses (Table 6). In terms of
coding potential, group 2a coronaviruses possess PL1pro and
PL2pro, but group 2b, 2c, and 2d coronaviruses possess only
one PLpro, which is homologous to PL2pro. It is noteworthy
that the authors of a recently published article stated that no
PLpro was identified in nsp3 of the genome of BtCoV/133/05
(NC_008315, >95% overall nucleotide identity with bat-CoV
HKU4) (24). However, after careful analysis of this nsp3 by
multiple alignment and a search for conserved domains and
amino acid residues (37), it was found that PLpro is present in
the genome of BtCoV/133/05, with the conserved Cys and His
residues of the catalytic dyad, the conserved aromatic amino
acid residue (Trp, Phe, or Tyr) immediately downstream of the
catalytic Cys, and the postulated metal-chelating Cys and His
residues of the zinc fingers (Fig. 3). The genomes of group 2a
coronaviruses, but not those of group 2b, 2c, and 2d
coronaviruses, encode hemagglutinin esterase. The genomes of
group 2b coronaviruses, but not those of group 2a, 2c, and 2d
coronaviruses, contain several small ORFs between the M and N
genes. The genomes of group 2d coronaviruses, but not those of
group 2a, 2b, and 2c coronaviruses, contain two ORFs
downstream of the N gene. As for the TRS, the sequence for the
TABLE 5. Estimation of nonsynonymous and synonymous substitution
rates in the genomes of bat-CoV HKU4, bat-CoV HKU5,
and bat-CoV HKU9

                                    Ka/Ks ratio
Coding
region   Bat-CoV HKU4            Bat-CoV HKU5            Bat-CoV HKU9

nsp1     0.031                   Ka = 0, Ks = 0.03711    0.247
nsp2     0.133                   0.061                   0.131
nsp3     0.154                   0.070                   0.091
nsp4     0.155                   0.045                   0.066
nsp5     Ka = 0, Ks = 0.00239    0.016                   0.035
nsp6     0.317                   0.076                   0.067
nsp7     Ka = 0, Ks = 0.00904    0.066                   0.020
nsp8     Ka = 0, Ks = 0          0.011                   0.025
nsp9     Ka = 0, Ks = 0.00691    0.021                   0.019
nsp10    Ka = 0, Ks = 0          0.050                   0.021
nsp11    Ka = 0, Ks = 0          Ka = 0, Ks = 0          0.283
nsp12    Ka = 0, Ks = 0.00163    0.003                   0.027
nsp13    Ka = 0, Ks = 0          0.009                   0.011
nsp14    Ka = 0, Ks = 0          0.007                   0.028
nsp15    Ka = 0, Ks = 0.00665    0.091                   0.044
nsp16    Ka = 0, Ks = 0          0.018                   0.081
S        0.010                   0.127                   0.170
NS3                                                      0.234
NS3a     0.187                   Ka = 0.00181, Ks = 0
NS3b     0.308                   0.201
NS3c     1.205                   0.027
NS3d     Ka = 0.00096, Ks = 0    0.166
E        Ka = 0, Ks = 0.00865    Ka = 0, Ks = 0.03392    0.108
M        Ka = 0, Ks = 0.00325    0.014                   0.097
N        0.473                   0.060                   0.096
NS7a                                                     0.961
NS7b                                                     0.529

* Mean of six comparisons. 


TRS of group 2a coronaviruses is CUAAAC and that of the
group 2b, 2c, and 2d coronaviruses is ACGAAC (10, 12, 16).
For the E gene, a TRS is present in group 2b, 2c, and 2d
coronaviruses but not in group 2a coronaviruses, which instead
use an IRES for translation of E. The genomes of group 2a, 2b,
and 2c coronaviruses, but not of group 2d coronaviruses,
contain bulged stem-loop and pseudoknot structures downstream
of the N gene.
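
To illustrate how the two TRS core sequences named above can be located in a genome, the following minimal Python sketch scans an RNA sequence for exact matches to a motif. The function name and the toy sequence are hypothetical and are used only for this example; identifying a functional TRS would also require examining its position relative to each ORF.

```python
from typing import Dict, List

# TRS core sequences as given in the text.
TRS_CORE: Dict[str, str] = {
    "group 2a": "CUAAAC",
    "group 2b/2c/2d": "ACGAAC",
}


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of every exact match to motif.

    Input is upper-cased and T is converted to U, so that DNA-style
    (cDNA) sequences can be searched with RNA motifs.
    """
    rna = sequence.upper().replace("T", "U")
    target = motif.upper().replace("T", "U")
    positions = []
    start = rna.find(target)
    while start != -1:
        positions.append(start)
        start = rna.find(target, start + 1)  # allow overlapping matches
    return positions


# Toy sequence (hypothetical, not taken from any genome in this study).
toy = "GGAUACGAACUUUAUGACGAACCC"
for group, motif in TRS_CORE.items():
    print(group, motif, find_motif(toy, motif))
```

In this toy sequence, the ACGAAC motif is found twice and the CUAAAC motif is absent.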
Coronaviruses are probably better classified into group 1
(subgroups 1a and 1b), group 2 (subgroups 2a, 2b, 2c, and 2d),
and group 3 than into seven groups. Traditionally,
coronaviruses have been classified into groups 1, 2, and 3. When
SARS-CoV was first identified and its genome was sequenced, it
was proposed that it constituted a fourth group of coronaviruses
(17, 21). However, after more extensive phylogenetic analyses, it
was suggested that SARS-CoV probably represents a distant
relative of group 2 coronaviruses, and it was subsequently
classified as a group 2b coronavirus (4, 22). In 2005, we and
another group in mainland China independently described
additional members of the group 2b coronaviruses (13, 15).
Recently, we described the discovery of six novel coronaviruses
from bats in Hong Kong (30). Phylogenetic analysis of the pol and
helicase genes showed that two of them, bat-CoV HKU4 and
bat-CoV HKU5, probably represent a novel subgroup of group
2 coronaviruses. Subsequently, another group reported similar
diversity in coronaviruses found in bats in mainland China
and proposed that coronaviruses be classified into five groups
rather than into groups 1, 2a, 2b, 2c, and 3 (24). In the
present study, we discovered another distinct subgroup of
coronaviruses (bat-CoV HKU9). We also performed complete
genome sequencing of four strains each of bat-CoV HKU4,
TABLE 6. Comparison of characteristics in the genomes of group
2a, 2b, 2c, and 2d coronaviruses

                                            Group 2 coronavirus
Characteristics*                  2a                   2b       2c       2d

Coding potential
  Papain-like protease            PL1pro and PL2pro    PLpro    PLpro    PLpro
  Hemagglutinin esterase          +                    -        -        -
  Small ORFs between M and N      -                    +        -        -
  NS7a and 7b downstream to N     -                    -        -        +
TRS
  TRS sequence                    CUAAAC               ACGAAC   ACGAAC   ACGAAC
  TRS/IRES for E                  IRES                 TRS      TRS      TRS
Stem-loop and pseudoknot          +                    +        +        -
  structures downstream to N

* TRS, transcription regulatory sequence; IRES, internal ribosome entry site. 


bat-CoV HKU5, and bat-CoV HKU9. This large amount of
genome sequence data enabled us to perform a thorough
comparative analysis of the genomes of the various groups of
coronaviruses. The results showed that the amino acid identities
in the various ORFs among the group 2 coronaviruses were
significantly higher than those between group 2 coronaviruses and
the group 1 and 3 coronaviruses. Phylogenetic trees
constructed using 3CLpro, Pol, helicase, S, and N all showed that
the group 2a, 2b, 2c, and 2d coronaviruses are more closely
related to each other than to the group 1 and 3 coronaviruses
(Fig. 5). These findings suggest that the group 2 coronaviruses
probably originated from one common ancestor before they
diverged into the four subgroups, and therefore it would be more
logical and informative to classify them as subgroups of group 2
coronaviruses.

This is the first time that NS7a and NS7b downstream of the N
gene have been observed in group 2 coronaviruses. Previously,
feline infectious peritonitis virus (FIPV), a group 1
coronavirus, was the only coronavirus known to possess two genes
downstream of the N gene (18). FIPV infects macrophages in a
variety of tissues systemically, whereas feline enteric
coronavirus (FECV), a coronavirus closely related to FIPV, is
restricted to replication in enterocytes. It has been found that
the FECV genome lacks the 300 nucleotides at the 3’ end of the
FIPV genome, suggesting that this region may be important for
virulence. Recently, it has been shown that an isogenic deletion
mutant of FIPV missing the 7ab cluster protected cats against
lethal challenge by FIPV, which makes the mutant a potential live
attenuated vaccine candidate (7). In addition to FIPV, the
genome of porcine transmissible gastroenteritis virus (TGEV)
also possesses one gene downstream of N (25). This gene
encodes a hydrophobic protein that associates with the
endoplasmic reticulum and cell surface membranes in TGEV-infected
cells, suggesting that it may have a role in the membrane
association of replication complexes or assembly of the virus
(25). In the present comparative genomic analysis, ORFs
downstream of the N gene were not found in any coronaviruses
other than group 1a coronaviruses and bat-CoV HKU9 (Fig. 2).
While the presence of a TRS suggests that NS7a and NS7b of
bat-CoV HKU9 are probably expressed, the high Ka/Ks ratios
imply that these two genes are under high selective pressure
and thus are rapidly evolving, which may be due to recent
acquisition by recombination. Further experiments will be
needed to delineate the function and essentiality of NS7a and
NS7b in bat-CoV HKU9.

The huge diversity of coronaviruses is probably a result of 
both a higher mutation rate of RNA viruses due to the infi- 
delity of their polymerases and a higher chance of recombina- 
tion as a result of their unique replication mechanism. Before 
the SARS epidemic in 2003, a total of 19 (2 human, 13 mam- 
malian, and 4 avian) coronaviruses were known. Since the 
SARS epidemic, two novel human coronaviruses have been 
discovered (5, 27, 29). In the past two years, at least 10 previ- 
ously unrecognized coronaviruses from bats have been de- 
scribed in Hong Kong and mainland China (13, 15, 20, 24, 30). 
In addition to the generation of a large number of coronavirus 
species, recombination has also resulted in the generation of 
different genotypes in a particular coronavirus species. This is 
exemplified by the presence of at least three genotypes in 
CoV-HKU1 as a result of recombination (33). The astonishing 
diversity of coronaviruses in bats implies that there are
probably many other unknown coronaviruses in other animal
species. Further molecular epidemiological studies in bats of
other countries, as well as in other animals, and complete 
genome sequencing will shed more light on coronavirus diver- 
sity and the evolutionary histories of these viruses. 